61 research outputs found

    Bayes and maximum likelihood for $L^1$-Wasserstein deconvolution of Laplace mixtures

    Full text link
    We consider the problem of recovering a distribution function on the real line from observations additively contaminated with errors following the standard Laplace distribution. Assuming that the latent distribution is completely unknown leads to a nonparametric deconvolution problem. We begin by studying the rates of convergence relative to the $L^2$-norm and the Hellinger metric for the direct problem of estimating the sampling density, which is a mixture of Laplace densities with a possibly unbounded set of locations: the rate of convergence for the Bayes' density estimator corresponding to a Dirichlet process prior over the space of all mixing distributions on the real line matches, up to a logarithmic factor, the $n^{-3/8}\log^{1/8}n$ rate for the maximum likelihood estimator. Then, appealing to an inversion inequality translating the $L^2$-norm and the Hellinger distance between general kernel mixtures, with a kernel density having polynomially decaying Fourier transform, into any $L^p$-Wasserstein distance, $p\geq1$, between the corresponding mixing distributions, provided their Laplace transforms are finite in some neighborhood of zero, we derive the rates of convergence in the $L^1$-Wasserstein metric for the Bayes' and maximum likelihood estimators of the mixing distribution. Merging in the $L^1$-Wasserstein distance between Bayes and maximum likelihood follows as a by-product, along with an assessment of the stochastic order of the discrepancy between the two estimation procedures.
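
    A minimal sketch of the sampling model described above and of the 1-Wasserstein distance in which recovery of the mixing distribution is assessed (assuming Python with NumPy/SciPy; the two-component normal mixing distribution is an arbitrary illustrative choice, and the "estimate" below is just the empirical law of the noisy data, not the Bayes or maximum likelihood deconvolution estimators studied in the paper):

        import numpy as np
        from scipy.stats import wasserstein_distance

        rng = np.random.default_rng(0)
        n = 5000

        # Latent signal X ~ F0 (an arbitrary two-component normal mixture for illustration)
        latent = np.where(rng.random(n) < 0.4,
                          rng.normal(-2.0, 0.5, n),
                          rng.normal(1.5, 0.8, n))

        # Observations Y = X + e, with e following the standard Laplace distribution
        noisy = latent + rng.laplace(loc=0.0, scale=1.0, size=n)

        # 1-Wasserstein distance between the empirical law of Y and the true signal law:
        # without deconvolution the Laplace noise inflates this distance, which is what
        # the Bayes and maximum likelihood estimators of the mixing distribution reduce.
        print(wasserstein_distance(noisy, latent))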

    Bayesian adaptation

    Full text link
    Given the need for low-assumption inferential methods in infinite-dimensional settings, Bayesian adaptive estimation via a prior distribution that depends neither on the regularity of the function to be estimated nor on the sample size is valuable. We elucidate relationships among the main approaches followed to design priors for minimax-optimal rate-adaptive estimation, meanwhile shedding light on the underlying ideas. Comment: 20 pages, Propositions 3 and 5 added.

    Empirical Bayes conditional density estimation

    Full text link
    The problem of nonparametric estimation of the conditional density of a response, given a vector of explanatory variables, is classical and of prominent importance in many prediction problems, since the conditional density provides a more comprehensive description of the association between the response and the predictor than, for instance, the regression function does. The problem has applications across different fields like economics, actuarial science and medicine. We investigate empirical Bayes estimation of conditional densities, establishing that an automatic data-driven selection of the prior hyper-parameters in infinite mixtures of Gaussian kernels, with predictor-dependent mixing weights, can lead to estimators whose performance is on par with that of frequentist estimators: they are minimax-optimal (up to logarithmic factors) rate adaptive over classes of locally Hölder smooth conditional densities, and they perform an adaptive dimension reduction if the response is independent of (some of) the explanatory variables which, containing no information about the response, are irrelevant to the purpose of estimating its conditional density.
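
    As a hedged illustration of the model structure referred to above (not the paper's prior or its data-driven hyperparameter selection), the sketch below evaluates a finite mixture of Gaussian kernels with predictor-dependent mixing weights, here generated by a softmax as a stand-in for covariate-dependent stick-breaking; all parameter values are hypothetical:

        import numpy as np
        from scipy.stats import norm

        # Truncated mixture f(y | x) = sum_k w_k(x) N(y; mu_k, sigma_k^2)
        K = 5
        rng = np.random.default_rng(1)
        mu = rng.normal(0.0, 2.0, K)        # component means
        sigma = rng.uniform(0.3, 1.0, K)    # component standard deviations
        alpha = rng.normal(0.0, 1.0, K)     # weight intercepts
        beta = rng.normal(0.0, 1.0, K)      # weight slopes in the predictor

        def mixing_weights(x):
            # Predictor-dependent weights via a softmax (a stand-in for the
            # covariate-dependent stick-breaking weights of the paper).
            z = alpha + beta * x
            e = np.exp(z - z.max())
            return e / e.sum()

        def conditional_density(y, x):
            w = mixing_weights(x)
            return np.sum(w * norm.pdf(y, loc=mu, scale=sigma))

        print(conditional_density(y=0.5, x=1.2))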

    Convergence rates for Bayesian density estimation of infinite-dimensional exponential families

    Full text link
    We study the rate of convergence of posterior distributions in density estimation problems for log-densities in periodic Sobolev classes characterized by a smoothness parameter p. The posterior expected density provides a nonparametric estimation procedure attaining the optimal minimax rate of convergence under Hellinger loss if the posterior distribution achieves the optimal rate over certain uniformity classes. A prior on the density class of interest is induced by a prior on the coefficients of the trigonometric series expansion of the log-density. We show that when p is known, the posterior distribution of a Gaussian prior achieves the optimal rate provided the prior variances die off sufficiently rapidly. For a mixture of normal distributions, the mixing weights on the dimension of the exponential family are assumed to be bounded below by an exponentially decreasing sequence. To avoid the use of infinite bases, we develop priors that cut off the series at a sample-size-dependent truncation point. When the degree of smoothness is unknown, a finite mixture of normal priors indexed by the smoothness parameter, which is also assigned a prior, produces the best rate. A rate-adaptive estimator is derived. Comment: Published at http://dx.doi.org/10.1214/009053606000000911 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
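
    A small sketch of the prior construction described above, assuming Python/NumPy: Gaussian coefficients on a truncated trigonometric basis for the log-density on [0, 1], with prior variances dying off polynomially in the frequency (the decay rate, the truncation point and the known smoothness p = 2 are illustrative choices, not the paper's):

        import numpy as np

        rng = np.random.default_rng(2)
        p = 2.0                    # smoothness parameter (assumed known here)
        J = 50                     # series truncation point (fixed for illustration)
        x = np.linspace(0.0, 1.0, 1001)

        j = np.arange(1, J + 1)
        sd = j ** (-(p + 0.5))     # prior standard deviations dying off with the frequency
        a = rng.normal(0.0, sd)    # cosine coefficients
        b = rng.normal(0.0, sd)    # sine coefficients

        # One log-density draw from the prior, then exponentiate and normalize.
        log_f = (a[:, None] * np.cos(2 * np.pi * np.outer(j, x))
                 + b[:, None] * np.sin(2 * np.pi * np.outer(j, x))).sum(axis=0)
        f = np.exp(log_f)
        f /= f.sum() * (x[1] - x[0])   # Riemann-sum normalization over [0, 1]
        print(f[500])                  # the random density drawn from the prior, evaluated at x = 0.5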

    On asymptotically efficient maximum likelihood estimation of linear functionals in Laplace measurement error models

    Get PDF
    Maximum likelihood estimation of linear functionals in the inverse problem of deconvolution is considered. Given observations of a random sample from a distribution $P_0\equiv P_{F_0}$ indexed by a (potentially infinite-dimensional) parameter $F_0$, which is the distribution of the latent variable in a standard additive Laplace measurement error model, one wants to estimate a linear functional of $F_0$. Asymptotically efficient maximum likelihood estimation (MLE) of integral linear functionals of the mixing distribution $F_0$ in a convolution model with the Laplace kernel density is investigated. Situations are distinguished in which the functional of interest can be consistently estimated at $n^{-1/2}$-rate by the plug-in MLE, which is asymptotically normal and efficient, in the sense of achieving the variance lower bound, from those in which no integral linear functional can be estimated at parametric rate, which precludes any possibility for asymptotic efficiency. The $\sqrt{n}$-convergence of the MLE, valid in the case of a degenerate mixing distribution at a single location point, fails in general, as does asymptotic normality. It is shown that there exists no regular estimator sequence for integral linear functionals of the mixing distribution that, when recentered about the estimand and $\sqrt{n}$-rescaled, is asymptotically efficient, viz., has a Gaussian limit distribution with minimum variance. One can thus only expect estimation with some slower rate and, often, with a non-Gaussian limit distribution.
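
    A rough sketch of the plug-in idea discussed above, assuming Python/NumPy: the mixing distribution is approximated by weights on a fixed grid fitted by EM (a fixed-grid stand-in for the full nonparametric MLE), and the fitted weights are plugged into an integral linear functional; the standard normal signal and the functional ∫ t² dF(t) are illustrative choices, not the paper's examples:

        import numpy as np

        rng = np.random.default_rng(3)
        n = 2000
        latent = rng.normal(0.0, 1.0, n)          # true (unknown) signal distribution
        y = latent + rng.laplace(0.0, 1.0, n)     # Laplace-contaminated observations

        grid = np.linspace(-5.0, 5.0, 101)        # fixed support points for F
        w = np.full(grid.size, 1.0 / grid.size)   # initial mixing weights

        def laplace_pdf(u):
            return 0.5 * np.exp(-np.abs(u))

        K = laplace_pdf(y[:, None] - grid[None, :])   # kernel matrix k(y_i - t_j)
        for _ in range(200):                          # EM iterations for the mixing weights
            resp = K * w
            resp /= resp.sum(axis=1, keepdims=True)
            w = resp.mean(axis=0)

        psi = lambda t: t ** 2                        # example integral linear functional
        print(np.sum(w * psi(grid)))                  # plug-in estimate of ∫ t^2 dF(t) (target value 1)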

    Bayes and empirical Bayes: do they merge?

    Full text link
    Bayesian inference is attractive for its coherence and good frequentist properties. However, it is a common experience that eliciting an honest prior may be difficult and, in practice, people often take an empirical Bayes approach, plugging empirical estimates of the prior hyperparameters into the posterior distribution. Even if not rigorously justified, the underlying idea is that, when the sample size is large, empirical Bayes leads to "similar" inferential answers. Yet, precise mathematical results seem to be missing. In this work, we give a more rigorous justification in terms of merging of Bayes and empirical Bayes posterior distributions. We consider two notions of merging: Bayesian weak merging and frequentist merging in total variation. Since weak merging is related to consistency, we provide sufficient conditions for consistency of empirical Bayes posteriors. Also, we show that, under regularity conditions, the empirical Bayes procedure asymptotically selects the value of the hyperparameter for which the prior most favors the "truth". Examples include empirical Bayes density estimation with Dirichlet process mixtures. Comment: 27 pages.
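
    A toy closed-form illustration of the empirical Bayes plug-in idea described above (assuming Python/NumPy; the conjugate normal-normal model is chosen only because the marginal likelihood and the posterior are available in closed form, and it is not an example from the paper):

        import numpy as np

        rng = np.random.default_rng(4)
        n = 200
        y = rng.normal(1.7, 1.0, n)    # Y_i | theta ~ N(theta, 1), with true theta = 1.7

        # Prior theta ~ N(m, 1) with unknown hyperparameter m.
        # Marginally Y_i ~ N(m, 2), so the marginal MLE of m is the sample mean.
        m_hat = y.mean()

        # Empirical Bayes posterior of theta: plug m_hat into the conjugate update.
        post_var = 1.0 / (n + 1.0)
        post_mean = (n * y.mean() + m_hat) / (n + 1.0)
        print(post_mean, post_var)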

    Wasserstein convergence in Bayesian and frequentist deconvolution models

    Get PDF
    We study the multivariate deconvolution problem of recovering the distribution of a signal from independent and identically distributed observations additively contaminated with random errors (noise) from a known distribution. For errors with independent coordinates having ordinary smooth densities, we derive an inversion inequality relating the L1-Wasserstein distance between two distributions of the signal to the L1-distance between the corresponding mixture densities of the observations. This smoothing inequality outperforms existing inversion inequalities. As an application of the inversion inequality to the Bayesian framework, we consider 1-Wasserstein deconvolution with Laplace noise in dimension one using a Dirichlet process mixture of normal densities as a prior measure on the mixing distribution (or distribution of the signal). We construct an adaptive approximation of the sampling density by convolving the Laplace density with a well-chosen mixture of normal densities and show that the posterior measure concentrates around the sampling density at a nearly minimax rate, up to a log-factor, in the L1-distance. The same posterior law is also shown to automatically adapt to the unknown Sobolev regularity of the mixing density, thus leading to a new Bayesian adaptive estimation procedure for mixing distributions with regular densities under the L1-Wasserstein metric. We also illustrate the utility of the inversion inequality in a frequentist setting by showing that an appropriate isotone approximation of the classical kernel deconvolution estimator attains the minimax rate of convergence for 1-Wasserstein deconvolution in any dimension d≥1 when only a tail condition is required on the latent mixing density, and we derive sharp lower bounds for these problems.
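
    A bare-bones sketch (assuming Python/NumPy) of the classical kernel deconvolution estimator that the frequentist part of the abstract builds on, for one-dimensional Laplace noise and without the isotone correction step described there; the sinc kernel, bandwidth, noise scale and standard normal signal are illustrative choices:

        import numpy as np

        rng = np.random.default_rng(5)
        n, b, h = 2000, 1.0, 0.3                  # sample size, Laplace noise scale, bandwidth
        latent = rng.normal(0.0, 1.0, n)
        y = latent + rng.laplace(0.0, b, n)

        # Sinc-kernel deconvolution: restrict frequencies to |t| <= 1/h and divide the
        # empirical characteristic function by the Laplace one, 1 / (1 + b^2 t^2).
        t = np.linspace(-1.0 / h, 1.0 / h, 1001)
        ecf = np.exp(1j * t[:, None] * y[None, :]).mean(axis=1)
        deconv_cf = ecf * (1.0 + (b * t) ** 2)

        # Fourier inversion on a grid of evaluation points.
        x = np.linspace(-4.0, 4.0, 201)
        integrand = np.exp(-1j * np.outer(x, t)) * deconv_cf
        f_hat = integrand.real.sum(axis=1) * (t[1] - t[0]) / (2.0 * np.pi)
        print(f_hat[100])   # estimated signal density at x = 0 (the true value is about 0.399)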